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Motivated by recent work studying massive imaging data in the 
neuroimaging literature, we propose multivariate varying coefficient 
models (MVCM) for modeling the relation between multiple func- 
tional responses and a set of covariates. We develop several statisti- 
cal inference procedures for MVCM and systematically study their 
theoretical properties. We first establish the weak convergence of the 
local linear estimate of coefficient functions, as well as its asymptotic 
bias and variance, and then we derive asymptotic bias and mean in- 
tegrated squared error of smoothed individual functions and their 
uniform convergence rate. We establish the uniform convergence rate 
of the estimated covariance function of the individual functions and 
its associated eigenvalue and eigenfunctions. We propose a global test 
for linear hypotheses of varying coefficient functions, and derive its 
asymptotic distribution under the null hypothesis. We also propose 
a simultaneous confidence band for each individual effect curve. We 
conduct Monte Carlo simulation to examine the finite-sample perfor- 
mance of the proposed procedures. We apply MVCM to investigate 
the development of white matter diffusivities along the genu tract of 
the corpus callosum in a clinical study of neurodevelopment. 

1. Introduction. With modern imaging techniques, massive imaging data 
can be observed over both time and space [4, 17, 19, 25, 37, 41]. Such imag- 
ing techniques include functional magnetic resonance imaging (fMRI), elec- 
troencephalography (EEG), diffusion tensor imaging (DTI), positron emis- 
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sion tomography (PET) and single photon emission-computed tomography 
(SPECT) among many other imaging techniques. See, for example, a recent 
review of multiple biomedical imaging techniques and their applications in 
cancer detection and prevention in Fass [17]. Among them, predominant 
functional imaging techniques including fMRI and EEG have been widely 
used in behavioral and cognitive neuroscience to understand functional seg- 
regation and integration of different brain regions in a single subject and 
across different populations [18, 19, 29]. In DTI, multiple diffusion prop- 
erties are measured along common major white matter fiber tracts across 
multiple subjects to characterize the structure and orientation of white mat- 
ter structure in human brain in vivo [2, 3, 54]. 

A common feature of many imaging techniques is that massive functional 
data are observed/calculated at the same design points, such as time for 
functional images (e.g., PET and fMRI). As an illustration, we present two 
smoothed functional data as an illustration and a real imaging data in Sec- 
tion 6, that we encounter in neuroimaging studies. First, we plot two diffu- 
sion properties, called fractional anisotropy (FA) and mean diffusivity (MD), 
measured at 45 grid points along the genu tract of the corpus callosum [Fig- 
ure 1(a) and (b)] from 40 randomly selected infants from a clinical study 




(a) (b) 
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(c) (d) 

Fig. 1. Representative functional neuroimaging data: (a) and (b) FA and MD along the 
genu tract of the corpus callosum from 40 randomly selected infants; and (c) and (d) the es- 
timated hemodynamic response functions (HRF) corresponding to two stimulus categories 
from 14 subjects. 
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of neurodevelopment with more than 500 infants. Scientists are particularly 
interested in delineating the structure of the variability of these functional 
FA and MD data and their association with a set of covariates of interest, 
such as age. We will systematically investigate the development of FA and 
MD along the genu of the corpus callosum tract in Section 6. Second, we 
consider the BOLD fMRI signal, which is based on hemodynamic responses 
secondary to neural activity. We plot the estimated hemodynamic response 
functions (HRF) corresponding to two stimulus categories from 14 randomly 
selected subjects at a selected voxel of a common template space from a clin- 
ical study of Alzheimer's disease with more than 100 infants. Although the 
canonical form of the HRF is often used, when applying fMRI in a clini- 
cal population with possibly altered hemodynamic responses [Figure 1(c) 
and (d)], using the subject's own HRF in fMRI data analysis may be advan- 
tageous because HRF variability is greater across subjects than across brain 
regions within a subject [1, 33]. We are particularly interested in delineating 
the structure of the variability of the HRF and their association with a set 
of covariates of interest, such as diagnostic group [34]. 

A varying-coefficient model, which allows its regression coefficients to vary 
over some predictors of interest, is a powerful statistical tool for addressing 
these scientific questions. Since it was systematically introduced to statisti- 
cal literature by Hastie and Tibshirani [24], many varying-coefficient mod- 
els have been widely studied and developed for longitudinal, time series 
and functional data [12, 13, 15, 23, 26-28, 38, 44, 47, 51]. However, most 
varying-coefficient models in the existing literature are developed for univari- 
ate response. Let yi{s) = (yii(s), . . . ,yij{s)Y' be a J-dimensional functional 
response vector for subject z = l,...,n, and Xj be its associated p x 1 
vector of covariates of interest. Moreover, s varies in a compact subset of 
Euclidean space and denotes the design point, such as time for functional 
images and voxel for structural and functional images. For notational sim- 
plicity, we assume s G [0, 1], but our results can be easily extended to higher 
dimensions. A multivariate varying coefficient model {MVCM) is defined as 



where Bj{s) = . . . , bjp{s)) is a p x 1 vector of functions of s, eij{s) 

are measurement errors, and i]ij{s) characterizes individual curve variations 
from -x^Bj{s). Moreover, {r]ij{s):s G [0,1]} is assumed to be a stochastic 
process indexed by s S [0, 1] and used to characterize the within-curve de- 
pendence. For image data, it is typical that the J functional responses Yi{s) 
are measured at the same location for all subjects and exhibit both the 
within-curve and between-curve dependence structure. Thus, for ease of no- 
tation, it is assumed throughout this paper that yj(s) was measured at the 
same M location points si = < S2 < • • • < sa/ = 1 for all i. 
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Most varying coefficient models in the existing literature coincide model 
(1.1) with J = 1 and without the within-curve dependence. Statistical infer- 
ences for these varying coefficient models have been relatively well studied. 
Particularly, Hoover et al. [26] and Wu, Chiang and Hoover [48] were among 
the first to introduce the time- varying coefficient models for analysis of lon- 
gitudinal data. Recently, Fan and Zhang [15] gave a comprehensive review 
of various statistical procedures proposed for many varying coefficient mod- 
els. It is of particular interest in data analysis to construct simultaneous 
confidence bands (SCB) for any linear combination of Bj instead of point- 
wise confidence intervals and to develop global test statistics for the general 
hypothesis testing problem on Bj. For univariate varying coefficient models 
without the within-curve dependence. Fan and Zhang [14] constructed SCB 
using the limit theory for the maximum of the normalized deviation of the 
estimate from its expected value. Faraway [16], Chiou, Miiller and Wang [8], 
and Cardot [5] proposed several varying coefficient models and their asso- 
ciated estimators for univariate functional response, but they did not give 
functional central limit theorem and simultaneous confidence band for their 
estimators. It has been technically difficult to carry out statistical inferences 
including simultaneous confidence band and global test statistic on Bj in 
the presence of the within-curve dependence. 

There have been several recent attempts to solve this problem in various 
settings. For time series data, which may be viewed as a case with n = 1 
and M— 7- oo, asymptotic SCB for coefficient functions in varying coefficient 
models can be built by using local kernel regression and a Gaussian ap- 
proximation result for nonstationary time series [52]. For sparse irregular 
longitudinal data. Ma, Yang and Carroll [35] constructed asymptotic SCB 
for the mean function of the functional regression model by using piecewise 
constant spline estimation and a strong approximation result. For functional 
data, Degras [9] constructed asymptotic SCB for the mean function of the 
functional linear model without considering any covariate, while Zhang and 
Chen [51] adopted the method of "smoothing first, then estimation" and 
propose a global test statistic for testing Bj, but their results cannot be 
used for constructing SCB for Bj. Recently, Cardot et al. [6], Cardot and 
Josserand [7] built asymptotic SCB for Horvitz-Thompson estimators for 
the mean function, but their models and estimation methods differ signifi- 
cantly from ours. 

In this paper, we propose an estimation procedure for the multivariate 
varying coefficient model (1.1) by using local linear regression techniques, 
and derive a simultaneous confidence band for the regression coefficient func- 
tions. We further develop a test for linear hypotheses of coefficient functions. 
The major aim of this paper is to investigate the theoretical properties of 
the proposed estimation procedure and test statistics. The theoretical de- 
velopment is challenging, but of great interest for carrying out statistical 
inferences on Bj. The major contributions of this paper are summarized 
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as follows. We first establish the weak convergence of the local linear esti- 
mator of Bj, denoted by Bj, by using advanced empirical process methods 
[31, 42]. We further derive the bias and asymptotic variance of Bj. These re- 
sults provide insight into how the direct estimation procedure for Bj using 
observations from all subjects outperforms the estimation procedure with 
the strategy of "smoothing first, then estimation." After calculating Bj, we 
reconstruct all individual functions r]ij and establish their uniform conver- 
gence rates. We derive uniform convergence rates of the proposed estimate 
for the covariance matrix of rjij and its associated eigenvalue and eigenvec- 
tor functions by using related results in Li and Hsing [32]. Using the weak 
convergence of the local linear estimator of Bj, we further establish the 
asymptotic distribution of a global test statistic for linear hypotheses of the 
regression coefficient functions, and construct an asymptotic SCB for each 
varying coefficient function. 

The rest of this paper is organized as follows. In Section 2, we describe 
MVCM and its estimation procedure. In Section 3, we propose a global test 
statistic for linear hypotheses of the regression coefficient functions and con- 
struct an asymptotic SCB for each coefficient function. In Section 4, we dis- 
cuss the theoretical properties of estimation and inference procedures. Two 
sets of simulation studies are presented in Section 5 with the known ground 
truth to examine the finite sample performance of the global test statistic 
and SCB for each individual varying coefficient function. In Section 6, we use 
MVCM to investigate the development of white matter diffusivities along the 
genu tract of the corpus callosum in a clinical study of neurodevelopment. 

2. Estimation procedure. Throughout this paper, we assume that ei{s) = 
(eii(s), . . . ,eij(s))^ and ?7j(s) = (?7ii(s), . . .,r]ij{s))'^ are mutually indepen- 
dent, and r/j(s) and £i(s) are independent and identical copies of SP(0,S^) 
and SP(0, S^), respectively, where SP(//, S) denotes a stochastic process vec- 
tor with mean function fj,{t) and covariance function S(s,t). Moreover, £i{s) 
and £i{t) are assumed to be independent for s^t, and ^^(s,*) takes the 
form of Ss{t)l{s = t), where ^^(t) = (se,jj'(i)) is a J x J matrix of functions 
of t and l(-) is an indicator function. Therefore, the covariance structure of 
yi(s), denoted by T,y{s,t), is given by 

(2.1) = Cov(yi(s),y,(t)) = + = t). 

2.1. Estimating varying coefficient functions. We employ local linear re- 
gression [11] to estimate the coefficient functions Bj. Specifically, we apply 
the Taylor expansion for Bj[sm) at s as follows: 

(2.2) Bj{Sm) » Bj{s) + Bj{,s){Sm - S)= Aj{s)zh-^^{Sm - s), 

where Zfi{sm — s) = (1, {sm — s)/K)^ and Aj{s) = [Bj{s)hijBj{s)\ is a p x 2 
matrix, in which Bj{s) = {bji{s), . . . ,bjp{s))^ is a p x 1 vector and bji{s) = 
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dbji{s)/ds for I = l,...,p. Let K(s) be a kernel function and Kh{s) = 
h~^K{s/h) be the rescaled kernel function with a bandwidth h. We esti- 
mate Aj{s) by minimizing the following weighted least squares function: 

n M 

(2-3) X] X] ~ Aj{s)zh^^{sm - s)fKh^.{sm - s). 

i=l m=l 

Let us now introduce some matrix operators. Let a*^^ = aa^ for any vector 
a and D be the Kronecker product of two matrices C and D. For an Mi x 
M2 matrix C = (cji), denote vec(C) = (cn, . . .,cim2, • • • ,cmi1,. • -^CMiNhV ■ 
Let Aj[s) be the minimizer of (2.3). Then 

n M 

(2.4) vec(ij(s)) = S(s,/iij)~^^ ^ Khij{sm - s)[xi (g) z^^^. (s„, - s)]yij(sm), 

i=l m=l 

where S(s, /iy) = EHi E^=i (^m - s)[xf ® z^,^, (s^ - s)®2]. Thus, we 
have 

(2.5) Bj{s) = (6,i(s), . . .,bjp{s)f = [Ip C3 (1, 0)] vec(i,(s)), 

where Ip is a p x p identity matrix. 

In practice, we may select the bandwidth hy by using leave-one-curve- 
out cross-validation. Specifically, for each j, we pool the data from all n 
subjects and select a bandwidth hij, denoted by hij, by minimizing the 
cross-validation score given by 

n M 

(2.6) CV(/iij) = (nM)-i^ ^[2/i,(s^) -xf4(s^,/ii,)(-*)]2, 

i=l m=l 

where Bj{s, hij)^~'^^ is the local linear estimator of Bj{s) with the bandwidth 
hij based on data excluding all the observations from the ith subject. 

2.2. Smoothing individual functions. By assuming certain smoothness 
conditions on r]ij{s), we also employ the local linear regression technique to 
estimate all individual functions r]ij{s) [11, 38, 43, 45, 49, 51]. Specifically, 
we have the Taylor expansion for r]ij{sm) at s, 

(2.7) r]ij{sm) ~ djj(s)^Zft,2^.(sm - s), 

where djj(s) = {riij{s),h2jfjij{s))'^ is a 2 x 1 vector. We develop an algo- 
rithm to estimate djj(s) as follows. For each i and j, we estimate djj(s) by 
minimizing the weighted least squares function: 

AI 

(2.8) ^[yij{s-m) -xfSj(Sm) - dij{s)^Zh2^{Sm - s)fKh^^{Sm - s). 
m=l 
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Then, r]ij{s) can be estimated by 
fjij{s) = (l,0)dij(s) 

M 

= X] ^h2j{Sm - s)[yij{Sm) - xf Sj(Sm)], 
m=l 

where Kfi^.{s) are the empirical equivalent kernels and djj(s) is given by 

" M 

_m=l 

M 

X ^ i^h.2j (^m - S)zh2j (■5m " s) bij (^m) " xf 5j (s^)] • 
m=l 

Finally, let Sij be the smoother matrix for the jth measurement of the ith 
subject [11], we can obtain 

(2.10) f)ij = {f]ij{si),...,fiij{sM)) =SijRij, 

where Rij = {yij{si) - xjBj{si), . . .,yij{sM) - xf Sj(sm))^- 

A simple and efficient way to obtain h2j is to use generalized cross- 
validation method. For each j, we pool the data from all n subjects and 
select the optimal bandwidth /i2j , denoted by /i2j , by minimizing the gener- 
alized cross-validation score given by 



(2.11) GCV(/i2j) 



Rfjihi — Sijy {Im — Sij)Ri 
^ [l-M~Hi{S,,)Y 



i=l 

Based on /i2j, we can use (2.9) to estimate r]ij(s) for all i and j. 

2.3. Functional principal component analysis. We consider a spectral de- 
composition of S^(s,i) = {T,.rijj'{s,t)) and its approximation. According to 
Mercer's theorem [36], if S^(s,t) is continuous on [0, 1] x [0, 1], then E^jj(s,t) 
admits a spectral decomposition. Specifically, we have 

oo 

(2.12) J:rj,n{s,t)=J2^jiM^)Mt) 

1=1 

for j = 1, . . . , J, where Aji > Xj2 > • • • > are ordered values of the eigen- 
values of a linear operator determined by ^-^jj with Yl'i^i ^ji < ^ 
the ■ipji{t)^s are the corresponding orthonormal eigenfunctions (or princi- 
pal components) [22, 32, 50]. The eigenfunctions form an orthonormal sys- 
tem on the space of square-integrable functions on [0,1], and r]ij{t) ad- 
mits the Karhunen-Loeve expansion as %(t) = Ylu^i^ijii^jii^)-: where S^iji = 
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/o Viji^)'^jii^) ds is referred to as the (j7)th functional principal component 
scores of the ith. subject. For each fixed the ■Fiji's are uncorrelated 

random variables with E{$^iji) = and E{^f-j) = Xji. Furthermore, for j ^ j' , 
we have 

oo oo 
1=1 V=l 

After obtaining fi^{s) = {fjii{s), . . . ,fjij{s)Y' , we estimate S^(s,t) by using 
the empirical covariance of the estimated i7j(s) as follows: 

n 

tn{s,t) = {n-p)-'Y,f,i{s)fi,{tf. 

i=l 

Following Rice and Silverman [39], we can calculate the spectral decompo- 
sition of Tir^jj{s,t) for each j as follows: 

(2.13) tnjj{s,t) = '^Xjiipji{s)'>pji{t), 

I 

where Xji > Xj2 > ••• > are estimated eigenvalues and the ijjji{t)^s are 
the corresponding estimated principal components. Furthermore, the (j, /)th 
functional principal component scores can be computed using S^iji = 
Ylm=i Vij{sm)ipji{sm)ism - Sm-i) for i = 1, . . . , n. We further show the uni- 
form convergence rate of S^(s,t) and its associated eigenvalues and eigen- 
functions. This result is useful for constructing the global and local test 
statistics for testing the covariate effects. 

3. Inference procedure. In this section, we study global tests for lin- 
ear hypotheses of coefficient functions and SCB for each varying coefficient 
function. They are essential for statistical inference on the coefficient func- 
tions. 

3.1. Hypothesis test. Consider the linear hypotheses of B(s) as follows: 

(3.1) i?o:Cvec(B(s)) = bo(s) for ah s vs. i^i : C vec(B(s)) / bo(s), 

where B(s) = [Bi{s), . . . , Bj{s)], C is a r x matrix with rank r and bo(s) 
is a given r x 1 vector of functions. Define a global test statistic Sn as 

(3.2) Sn= [ d(s)^[C(S^(s,s)®0^^)C^]~^d(s)ds, 

Jo 

where Qx = E7=i^f^ and d(s) = Cvec(B(s) - bias(B(s))) - bo(s). 
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To calculate Sn, we need to estimate the bias of Bj{s) for all j. Based on 
(2.5), we have 

bias(i3j(s)) 
(3.3) 

/ n 
X VeC S(s,/lij)~^^ ^ Kh^^{Sm - s)[yLi ® Zh^^{Sm - s)] 
\ i=l m=l 

By using Taylor's expansion, we have 

Bj{s„,) - Aj{s)zh,^{Sm - S) ~ 2-^Bj{s){Sm - s)^ + 6"^ B j{s){Sm " s)^ 

where Bj{s) = (f B j{s) / ds^ and Bj{s) = (fBj{s) /ds'^ . Following the pre- 
asymptotic substitution method of Fan and Gijbels [11], we replace Bj{sm) — 

Aj{s)zhT^.{sm-s) by 2^^Bj{s){sm-s)'^ + 6~'^Bj{s){sm.-s)'^, in which Bj{s) 

and Bj(s) are estimators obtained by using local cubic fit with a pilot band- 
width selected in (2.6). 

It will be shown below that the asymptotic distribution of Sn is quite 
complicated, and it is difficult to directly approximate the percentiles of 
Sn under the null hypothesis. Instead, we propose using a wild bootstrap 
method to obtain critical values of Sn- The wild bootstrap consists of the 
following three steps: 

Step 1. Fit model (1.1) under the null hypothesis Hq, which yields 
B*{sni), vloism) and e*o(^m) for i = 1,. . . ,n and m = 1, . . . ,M. 

Step 2. Generate a random sample r-^^^ and Ti{sni)^^^ from a A^(0, 1) 
generator for i= 1, . . . , n and ?ti = 1, . . . , M and then construct 

y.(s„)(5) = B*{s)'^^i + rl'^l^isrn) + ri(s„)(^?)£*o(sm)- 

Then, based on Yi{sm)^^\ we recalculate B(s)(^\ bias(B(s)(^)) and d(s)(^) = 
Cvec(B(s)(3) -bias(B(s)(3)))-bo(s). We also note that C vec(B(s)(»)) ^ bo 
and C vec(bias(B(s)(^^)) 0. Thus, we can drop the term bias(B(s)(^^) in 
d(s)(3) for computational efficiency. Subsequently, we compute 

5^5) =n rd(s)(^')^[C(S^(s,s)0A^i)C^]~M(s)(3)ds. 
Jo 
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Step 3. Repeat Step 2 G times to obtain {Sn^ : = 1, . . . , G}, and then 

calculate p = ^^^^ 1(5^^^ > If p is smaller than a pre-specified 
significance level a, say 0.05, then one rejects the null hypothesis Hq. 

3.2. Simultaneous confidence hands. Construction of SCB for coefficient 
functions is of great interest in statistical inference for model (1.1). For a 
given confidence level a, we construct SCB for each hji{s) as follows: 

(3.4) P(6j;'"(s) < hjiis) < 6^^'"(s) for ah se[OA]) = l-a, 

where b^l'^{s) and Vji'^{s) are the lower and upper limits of SCB. Specifically, 
it will be shown below that a 1 — a simultaneous confidence band for hji{s) 
is given as follows: 

(3.5) (b,i{s) - hi^{h,i{s)) - ^^^;b,i{s) - hias{bji{s)) + 

where Cji{a) is a scalar. Since the calculation of bji{s) and bias(6j;(s)) has 
been discussed in (2.5) and (3.3), the next issue is to determine Cji{a). 

Although there are several methods of determining Cji{a) including ran- 
dom field theory [40, 46], we develop an efficient resampling method to 
approximate Cji{a) as follows [30, 55]: 

• We calculate rij{sm) = yij{sm) ~ ^ Bj{sm) for all and m. 

• For (7 = 1, . . . , G, we independently simulate {t^^^^ : i = 1, . . . , n} from A^(0, 1) 
and calculate a stochastic process Gj{s)^^^ given by 

\/^[/p® (1,0)] 

(n M \ 

S(s,/lij)"^ ^rf^^ Kh,^{Sm - s)[:>Ci ^ Zh,^{Sm - s)]fij{Srn) . 

i=l m=l / 

• We calculate supsg^^i] |e/Gj (s)^^^ | for all g, where e/ be a p x 1 vector with 
the Ith element 1 and otherwise, and use their 1 — a empirical percentile 
to estimate Cji{a). 

4. Asymptotic properties. In this section, we systematically examine the 
asymptotic properties of B(s), fjijis), S^(s,t) and 5„ developed in Sections 2 
and 3. Let us first define some notation. Let Ur{K) = J fK{t) dt and Vr{K) = 
J t''K^{t)dt, where r is any integer. For any smooth functions f{s) and 

gis,t), define /(s) = dfis)/ds, f\s) = d''f{s)/ds\ J{s) = d^f{s)/ds^ and 
g^"-'^\s,t) = d""^^ g{s ,t) / d"' s d^t , where a and b are any nonnegative integers. 
Let H = diag(/iii,...,/iij), B(s) = [Si(s), . . . , Sj(s)], B(s) = [Si(s), . . . , 
Bj{s)] andB{s) = [Bi{s),...,Bj{s)], where Bj{s) = (bji{s), ... ,bjp{s))'^ is 
a p X 1 vector. Let S = {si, . . . ,sm}- 
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4.1. Assumptions. Throughout the paper, the following assumptions are 
needed to facihtate the technical details, although they may not be the 
weakest conditions. We need to introduce some notation. Let N{^, S) be 
a normal random vector with mean /i and covariance S. Let Qi{h,s) = 
f{l,h~^{u — s))®'^Kh{u — s)TT{u)du. Moreover, we do not distinguish the 
differentiation and continuation at the boundary points from those in the 
interior of [0, 1]. For instance, a continuous function at the boundary of [0, 1] 
means that this function is left continuous at and right continuous at 1. 

Assumption (CI). For aU j = 1, . . . , J, sup^^ E[\eij{sm)\'^] < oo for some 
q> A and all grid points Sm. 

Assumption (C2). Each component of {r/(s):sG [0,1]}, {ri{s)r]{t)'^ : 
(s, t) E [0, and {xrj^is) : s G [0, 1]} are Donsker classes. 

Assumption (C3). The covariate vectors Xj's are independently and 
identically distributed with E'xj = /j.^ and || 

^illoo ^ Assume that E\x- ] — 

Qx is positive definite. 

Assumption (C4). The grid points S = {sm,m = 1, . . . ,M} are ran- 
domly generated from a density function it{s). Moreover, 7r(s) > for all 
s £ [0, 1] and 7r(s) has continuous second-order derivative with the bounded 
support [0, 1]. 

Assumption (C4b) . The grid points S = {sm,m = 1, . . . , M} are pre- 
fixed according to 7r(s) such that 7r(s) ds = m/M for M >m>l. More- 
over, 7r(s) > for all s £ [0, 1] and 7r(s) has continuous second-order deriva- 
tive with the bounded support [0, 1] . 

Assumption (C5). The kernel function K{t) is a symmetric density 
function with a compact support [—1,1], and is Lipschitz continuous. More- 
over, < inf/ig(o,/io],s6[o,i] det(r2i(/i, s)), where /iq > is a small scalar and 
det(ili(/i, s)) denotes the determinant of Qi{h,s). 

Assumption (C6). All components of B(s) have continuous second 
derivatives on [0,1]. 

Assumption (C7). Both n and M converge to oo, maxjhij = o(l), 
Mhij — )• oo and maxj h]^^\loghij\^^'^^'^^ < M^~'^/'^^ for j = 1, . . . , J, where 
51 G (2,4). 

Assumption (C7b). Both n and M converge to oo, maxj hij = o(l), 
Mhij — )■ oo and log(M) = o{Mhij). There exists a sequence of 7^ > such 
that 7„ — ;> 00, maxj 71^/^7^"'^ /ij"-^ = o(l) and n^^/^7„ log(M) = o(l). 
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Assumption (C8). For all j, maxj 
(2, oo), maxj /i2j = o(l), and M/i2j — )• oo for j = 1, . . . , J. 

Assumption (C9) . The sample path of r]ij{s) has continuous second-or- 
der derivative on [0,1] and £'[sup^g[o,i] 11^(^)112^] <C)0 and Sjsupggp [||r/(s)||2+ 
||r)(s)||2]''^} < oo for some ri,r2 G (2,oo), where || • ||2 is the Euclidean norm. 

Assumption (C9b). £'[sup^g[o^i] ||»7(s)||2^] < oo for some n E (2,oo) 
and all components of have continuous second-order partial deriva- 

tives with respect to € [0, 1]^ and inf^g^ S^(s,s) > 0. 

Assumption (CIO). There is a positive fixed integer Ej < oo such that 
Xj^i >■■■> Xj^Ej > >^j,Ej+i > • • • > for j = 1, . . . , J. 

Remark. Assumption (CI) requires the uniform bound on the high- 
order moment of eij{sm) for all grid points Sm- Assumption (C2) avoids 
smoothness conditions on the sample path ?7(s), which are commonly as- 
sumed in the literature [9, 22, 51]. Assumption (C3) is a relatively weak 
condition on the covariate vector, and the boundedness of ||xj||2 is not es- 
sential. Assumption (C4) is a weak condition on the random grid points. 
In many neuroimaging applications, M is often much larger than n and for 
such large M , a regular grid of voxels is fairly well approximated by vox- 
els generated by a uniform distribution in a compact subset of Euclidean 
space. For notational simplicity, we only state the theoretical results for 
the random grid points throughout the paper. Assumption (C4b) is a weak 
condition on the fixed grid points. We will prove several key results for 
the fixed grid point case in Lemma 8 of the supplemental article [53]. The 
bounded support restriction on K{-) in Assumption (C5) is not essential 
and can be removed if we put a restriction on the tail of K[-). Assump- 
tion (C6) is the standard smoothness condition on B(s) in the literature 
[12, 13, 15, 23, 26-28, 38, 44, 47, 51]. Assumptions (C7) and (C8) on band- 
widths are similar to the conditions used in [10, 32]. Assumption (C7b) 
is a weak condition on n, M, hij and 7„ for the fixed grid point case. For 
instance, if we set 7^ = n^/^ log(M)~^~'^° for a positive scalar cq > 0, then we 
have n^/2^^-g/j-i ^ ^^1-9/2 fog(M)(i+'=o)('?-i)/i-i = o(l) and n-V2^„fog(M) = 

log(M)~'^o = 0(1). As shown in Theorem 1 below, if hij = 0{{nM)~^/^) and 
7„ = ,ii/2 log(M)-i-^o, ni/27^-''/i-i reduces to n^l^-il'^ log(M)(i+^o)(9-i) Af^/^ 
For relatively large q in Assumption (CI), n^/s-^/s fog(M)(i+^o)(9-i)Mi/5 
can converge to zero. Assumptions (C9) and (C3) are sufficient conditions 
of Assumption (C2). Assumption (C9b) on the sample path is the same as 
Condition C6 used in [32] . Particularly, if we use the method for estimating 
S^(s,s') considered in Li and Hsing [32], then the differentiability of 77(5) in 
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Assumption (C9) can be dropped. Assumption (CIO) on simple multiplic- 
ity of the first Ej eigenvalues is only needed to investigate the asymptotic 
properties of eigenfunctions. 

4.2. Asymptotic properties o/B(s). The following theorem establishes 
the weak convergence of {B{s),s G [0, 1]}, which is essential for constructing 
global test statistics and SCB for B(s). 

Theorem 1. Suppose that Assumptions (C1)~(C7) hold. The following 
results hold: 

(i) V^{vec(B(s) - B(s) - 0.5B(s)U2(K; s,H)H2[1 + Op(l)]) : s G [0, 1]} 
converges weakly to a centered Gaussian process G{-) with covariance matrix 
S^(s,s') , where Ox = -E[x®^] and \J2{K;s,'H) is a J x J diagonal 
matrix, whose diagonal elements will he defined in Lemma 5 in the Appendix. 

(ii) The asymptotic bias and conditional variance of Bj{s) given S for 
sG (0,1) are given by 0.bhyU2{K)Bj{s)[l + Op{l)] and n~^'Enjj{s,s)Q^^[l + 
Op(l)], respectively. 

Remarks. (1) The major challenge in proving Theorem l(i) is dealing 
with within-subject dependence. This is because the dependence between 
?7(s) and ri{s') in the newly proposed multivariate varying coefficient model 
does not converge to zero due to the within-curve dependence. It is worth 
noting that for any given s, the corresponding asymptotic normality of B(s) 
may be established by using related techniques in Zhang and Chen [51]. 
However, the marginal asymptotic normality does not imply the weak con- 
vergence of B(s) as a stochastic process in [0, 1], since we need to verify the 
asymptotic continuity of {B(s) : s G [0, 1]} to establish its weak convergence. 
In addition, Zhang and Chen [51] considered "smoothing first, then estima- 
tion," which requires a stringent assumption such that n = 0(M^/^). Read- 
ers are referred to Condition A. 4 and Theorem 4 in Zhang and Chen [51] 
for more details. In contrast, directly estimating B(s) using local kernel 
smoothing avoids such stringent assumption on the numbers of grid points 
and subjects. 

(2) Theorem l(ii) only provides us the asymptotic bias and conditional 
variance of Bj{s) given S for the interior points of (0,1). The asymptotic 
bias and conditional variance at the boundary points and 1 are given in 
Lemma 5. The asymptotic bias of Bj{s) is of the order hfj, as the one in 
nonparametric regression setting. Moreover, the asymptotic conditional vari- 
ance of Bj{s) has a complicated form due to the within-curve dependence. 
The leading term in the asymptotic conditional variance is of order n~^, 
which is slower than the standard nonparametric rate {nMhij)~^ with the 
assumption hij — )■ and Mhij — )• oo. 
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(3) Choosing an optimal bandwidth hy is not a trivial task for model 
(1.1). Generally, any bandwidth hy satisfying the assumptions hy — t- and 
Mhij —7- oo can ensure the weak convergence of {B(s) : s € [0, 1]}. Based on 
the asymptotic bias and conditional variance of B(s), we can calculate an 
optimal bandwidth for estimating B(s), hij = Op{{nM)~^^^). In this case, 



n ^hjj and (nAf) ^hij reduce to Op{n '^/^M ^/^) and (nM) respec- 
tively, and their contributions depend on the relative size of n over M. 



4.3. Asymptotic properties of fiij{s). We next study the asymptotic bias 
and covariance of fjij^s) as follows. We distinguish between two cases. The 
first one is conditioning on the design points in S, X, and rj. The other is 
conditioning on the design points in S and X. We define K*{{s — t)/h) = 
J K{u)K{u + {s-t)/h)du. 

Theorem 2. Under Assumptions (CI) and (C3)-(C8), the following 
results hold for all s G (0, L) : 

(a) Conditioning on {S,'K,ri), we have 
Bias[fjij{s)\S,r],Xi] 

= 0.5u2iK)[ii,j{s)hlj+xjBjis„,)hlj][l + Op{l)]+Op{n~^/^), 
Cov[7?ij(s),?7ij(i)|5,r7,Xi] 

= K*{{s -t)/h2Mtr\Mh2j)-^Op{l) -^fn^^^nMhjy^Opil). 

(b) The asymptotic bias and covariance of f]ij{s) conditioning on S and 
X are given by 

Bias[fi,,{s)\S,X]=0.5u2{K)x[Bjism)hlj[l + Opil)], 
Cov{fjij{s) - r]ij{s),f]ij{t) - r]ij{t)\S,X.) 

= [l + Opm0.25u2{Kfh%J:'^^is,t) 

+ K*{{S - t)/h2j)7T{t)-\Mh2j)-^Op{l) 

+ n ^x[il.j^^x.iY,r),jjis,t)]. 

(c) The mean integrated squared error (MISE) of all f]ij{s) is given by 

n ^1 

-1 ' 

n 



"'E / E{[mj{s)-riij{s)f\S}7:{s)ds 



(4.1) 

X <{ 0((M/i2j)~^) + / S^jj(s,s)7r(s)(is 



+ Q.2bul{K) j\Bj{sfnxBj{s)htj + ^5^(5, s)h%]7T{s) c?sj 
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(d) The optimal bandwidth for minimizing MISE (4-1) is given by 

(4.2) h2j=0{M-^/^). 

(e) The first order LPK reconstructions fjij{s) using h2j in (4-2) satisfy 

(4.3) sup \ fii,{s) - ri,j{s)\ = Op(|log(M)|^/2^f-2/5 ^ ^2^. ^ ^-1/2^ 
se[o,i] 

for i = 1, . . . ,n. 

Remark. Theorem 2 characterizes the statistical properties of smooth- 
ing individual curves rjij{s) after first estimating Bj{s). Conditioning on in- 
dividual curves r]ij{s), Theorem 2(a) shows that Bias[^jj(s)|5, X, 77] is associ- 
ated with 0.bu2{K)x[ Bj{sm)hij , which is the bias term of Bj{s) introduced 
in the estimation step, and 0.5u2{K)7iij{s)h2j is introduced in the smooth- 
ing individual functions step. Without conditioning on rjij{s), Theorem 2(b) 
shows that the bias of flij{s) is mainly controlled by the bias in the estima- 
tion step. The MISE of 57ij(s) in Theorem 2(c) is the sum of Op{n~^ + hfj) 
introduced by the estimation of Bj{s) and Op{{Mh2j)~^ ~'r h-tj) introduced 
by the reconstruction of r]ij{s). The optimal bandwidth for minimizing the 
MISE of fjijis) is a standard bandwidth for LPK. If we use the optimal 
bandwidth in Theorem 2(d), then the MISE of fjij[s) can achieve the order 
of n-^ + h\- + M-^/^ . 

4.4. Asymptotic properties ofT,r]is, t) . In this section, we study the asymp- 
totic properties of E^(s,i) and its spectrum decomposition. 

Theorem 3. (i) Under Assumptions (CI) and (C3)-(C9), it follows 
that 

sup \tr,{s,t) - S^(s,t)| = Op((M/i2j)~^ + h% + hi. + (logn/n)V2). 

(s,t)6[0,l]2 

(ii) Under Assumptions (CI) and (C3)-(C10), if the optimal bandwidths 
hmj for m = 1, 2 are used to reconstruct Bj(s) and fiij{s) for all j , then 
for I = 1, . . . ,Ej , we have the following results: 

(a) loiM^) - M^)? ds = Op{{Mh2j)-' + h% + h% + (logn/n)i/2); 

(b) |A,, - A,,| = Op((M/i2,)-i + /if, + /il, + (logn/n)V2). 

Remark. Theorem 3 characterizes the uniform weak convergence rates 
of S^(s,t), il^ji and A,; for all j. It can be regarded as an extension of The- 
orems 3.3-3.6 in Li and Hsing [32], which established the uniform strong 
convergence rates of these estimates with the sole presence of intercept and 
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J = 1 in model (1.1). Another difference is tfiat Li and Hsing [32] employed 
all cross products yijyik for j 7^ /c and then used the local polynomial ker- 
nel to estimate S^(s,t). As discussed in Li and Hsing [32], their approach 
can relax the assumption on the differentiability of the individual curves. In 
contrast, following Hall, Miiller and Wang [22] and Zhang and Chen [51], 
we directly fit a smooth curve to i]ij{s) for each i and estimate T,^(s,t) by 
the sample covariance functions. Our approach is computationally simple 
and can ensure that all S^jj(s,t) are positive semi-definite, whereas the 
approach in Li and Hsing [32] cannot. This is extremely important for high- 
dimensional neuroimaging data, which usually contains a large number of 
locations (called voxels) on a two-dimensional (2D) surface or in a 3D vol- 
ume. For instance, the number of M can number in the tens of thousands 
to millions, and thus it can be numerically infeasible to directly operate on 
E^(s,s'). 

We use Ij^(s,s') to denote the local linear estimator of S^(s,s') proposed 
in Li and Hsing [32]. Following the arguments in Li and Hsing [32], we can 
easily obtain the following result. 

Corollary 1. Under Assumptions (C1)-(C8) and (C9h), it follows 
that 

sup |S^(s,t)-S^(s,t)| =Op(/i?- + /iL + (logn/n)i/2). 

(s,t)e[o,i]2 

4.5. Asymptotic properties of the inference procedures. In this section, 
we discuss the asymptotic properties of the global statistic Sn and the critical 
values of SCB. Theorem 1 allows us to construct SCB for coefficient functions 
bji{s). It follows from Theorem 1 that 

(4.4) Mbjiis) - bji{s) - Bias(6,7(s))] Gji{s), 

where denotes weak convergence of a sequence of stochastic processes, 
and Gji{s) is a centered Gaussian process indexed by s G [0,1]. Therefore, 
let Xc{s) be a centered Gaussian process, and we have 

[C(S^(s, s) ® A-i)C^]-^/^d(s) ^ Xcis), 

(4-5) 

sup \Vn[bji{s) - bji{s) - Bias{bji{s))]\ sup \Gji{s)\. 

sG[0,l] sG[0,l] 

We define Cji{a) such that P(sup^g[o,i] 1^*^7(5)1 < Cji{a)) = 1 — a. Thus, the 
confidence band given in (3.5) is a 1 — a simultaneous confidence band for 
bji{s). 

Theorem 4. If Assumptions (C1)-(C9) are true, then we have 

(4.6) Sn^ f Xc{sfXc{s)ds. 

Jo 
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Remark. Theorem 4 is similar to Theorem 7 of Zhang and Chen [51]. 
Both characterize the asymptotic distribution of Sn- In particular, Zhang 
and Chen [51] delineate the distribution of Jq Xc{s)'^ Xc{s) ds as a x^-type 
mixture. All discussions associated with Theorem 7 of Zhang and Chen [51] 
are valid here, and therefore, we do not repeat them for the sake of space. 

We consider conditional convergence for bootstrapped stochastic pro- 
cesses. We focus on the bootstrapped process G [0,1]} as the 
arguments for establishing the wild bootstrap method for approximating the 
null distribution of Sn and the bootstrapped process {Gj{sY^^ : s G [0, 1]} are 
similar. 

Theorem 5. If Assumptions (C1)-(C9) are true, then Gj{s)^^\s) con- 
verges weakly to Gj{s) conditioning on the data, where Gj{s) is a centered 
Gaussian process indexed by s € [0, 1]. 

Remark. Theorem 5 validates the bootstrapped process of Gj{s)^^\ An 
interesting observation is that the bias correction for Bj{s) in constructing 
Gj{sY^^ is unnecessary. It leads to substantial computational saving. 

5. Simulation studies. In this section, we present two simulation example 
to demonstrate the performance of the proposed procedures. 

Example 1. This example is designed to evaluate the type I error rate 
and power of the proposed global test Sn using Monte Carlo simulation. In 
this example, the data were generated from a bivariate MVCM as fohows: 

(5.1) yij(Sm) = xfSj(Sm) +r?ij(Sm) +%(Sm) forj = l,2, 

where Sm ~ U[0,1], (eji(sm),ei2(sm))^ ~ iV((0, 0)^, ^^(sm) = diag(crf , o-|)) 
and Xj = (l,Xji,Xj2) for all i = 1, . . . ,n and m = 1, . . . ,M. Moreover, (xn, 
Xi2f ~ iV((0,0)'^,diag(l - 2-0-5,1 - 2-0-5) ^ 2-0-5(1, 1)®2) and 7?ij(s) = 
S.ijiipji{s)+S.ij2-ipj2{s), where ^ij7 ~ N{0,Xji) for j = 1,2 and 1 = 1,2. Further- 
more, Sm, ixii,Xi2), Ciii, Cii2, Ci2i, Ci22, eii(sm), and ej2(sm) are independent 
random variables. We set (An, A12, erf , A21, A22, 172) = (1.2, 0.6, 0.2, 1, 0.5, 0.1) 
and the functional coefhcients and eigenfunctions as follows: 

611 (s) = s\ buis) = (1 - sf, 6i3(s) = 4s(l - s) - 0.4; 
^11 (•s) = ■v/2sin(27rs), V'i2(s) = "V^ cos(27rs); 

621(5) = 5(s-0.5)^ 622(5) = s°■^ 623(5) = 4s(l-s)- 0.4; 
"021 (•s) = ■v/2cos(27rs), V'22(s) = V2sin(27rs). 

Then, except for (613(5), 623(5)) for all s, we fixed all other parameters at 
the values specified above, whereas we assumed (613(5), 623(5)) = c(45(l — 
5) — 0.4, 45(1 — s) — 0.4), where c is a scalar specified below. 
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Fig. 2. Plot of power curves. Rejection rates of Sn based on the wild bootstrap method 
are calculated at five different values of c (0, 0.1, 0.2, 0.3, and 0.4) for two sample sizes 
of n (100 and 200) subjects at 5% (green) and 1% (red) significance levels. 




We want to test the hypotheses Hq : 613(5) = 623(5) = for aU s against 
Hi : 613(5) 7^ or 623(5) 7^ for at least one 5. We set c = to assess the 
type I error rates for Sn, and set c = 0.1, 0.2, 0.3 and 0.4 to examine the 
power of Sn- We set M = 50, n = 200 and 100. For each simulation, the 
significance levels were set at a = 0.05 and 0.01, and 100 replications were 
used to estimate the rejection rates. 

Figure 2 depicts the power curves. It can be seen from Figure 2 that the 
rejection rates for Sn based on the wild bootstrap method are accurate for 
moderate sample sizes, such as (n = 100 or 200) at both significance levels 
(a = 0.01 or 0.05). As expected, the power increases with the sample size. 

Example 2. This example is used to evaluate the coverage probabili- 
ties of SCB of the functional coefficients 6(5) based on the wild bootstrap 
method. The data were generated from model (5.1) under the same pa- 
rameter values. We set n = 500 and M = 25, 50, and 75 and generated 
200 datasets for each combination. Based on the generated data, we calcu- 
lated SCB for each component of Bi{s) and 82(3). Table 1 summarizes the 
empirical coverage probabilities based on 200 simulations for a = 0.01 and 
a = 0.05. The coverage probabilities improve with the number of grid points 
M. When M = 75, the differences between the coverage probabilities and 
the claimed confidence levels are fairly acceptable. The Monte Carlo errors 
are of size v^O.95 x 0.05/200 w 0.015 for a = 0.05. Figure 3 depicts typi- 
cal simultaneous confidence bands, where n = 500 and M = 50. Additional 
simulation results are given in the supplemental article [53]. 

6. Real data analysis. The data set consists of 128 healthy infants (75 
males and 53 females) from the neonatal project on early brain development. 
The gestational ages of these infants range from 262 to 433 days, and their 
mean gestational age is 298 days with standard deviation 17.6 days. The 
DTIs and Tl-weighted images were acquired for each subject. For the DTIs, 
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Table 1 

Empirical coverage probabilities of 1 — a SCB for all 
components of Bi{-) and B2{-) based on 200 simulated data 



sets 


M 




6l2 




621 


f>22 


f)23 








a = 


0.05 






25 


0.915 


0.930 


0.945 


0.920 


0.915 


0.945 


50 


0.925 


0.940 


0.945 


0.930 


0.925 


0.950 


75 


0.945 


0.950 


0.955 


0.945 


0.945 


0.955 








a = 


0.01 






25 


0.985 


0.965 


0.985 


0.985 


0.990 


0.980 


50 


0.995 


0.980 


0.985 


0.985 


0.995 


0.985 


75 


0.990 


0.985 


0.990 


0.995 


0.990 


0.990 



the imaging parameters were as follows: the six noncollinear directions at 
the 6- value of 1000 s/mm^ with a reference scan (6 = 0), the isotropic voxel 
resolution = 2 mm, and the in-plane field of view = 256 mm in both direc- 
tions. A total of five repetitions were acquired to improve the signal-to-noise 
ratio of the DTIs. 

The DTI data were processed by two key steps including a weighted least 
squares estimation method [2, 54] to construct the diffusion tensors and a 
DTI atlas building pipeline [20, 56] to register DTIs from multiple subjects 



S5% SCB of till K'A SCD of 6ij 9i% SCB of dia 




Fig. 3. Typical simultaneous confidence bands with n = 500 and M = 50. The red solid 
curves are the true coefficient functions, and the blue dashed curves are the confidence 
bands. 
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to create a study specific unbiased DTI atlas, to track fiber tracts in tlie 
atlas space and to propagate them back into each subject's native space by 
using registration information. Subsequently, diffusion tensors (DTs) and 
their scalar diffusion properties were calculated at each location along each 
individual fiber tract by using DTs in neighboring voxels close to the fiber 
tract. Figure 1(a) displays the fiber bundle of the genu of the corpus cal- 
losum (GCC), which is an area of white matter in the brain. The GCC is 
the anterior end of the corpus callosum, and is bent downward and back- 
ward in front of the septum pellucidum; diminishing rapidly in thickness, it 
is prolonged backward under the name of the rostrum, which is connected 
below with the lamina terminalis. It was found that neonatal microstruc- 
tural development of GCC positively correlates with age and callosal thick- 
ness. 

The two aims of this analysis are to compare diffusion properties including 
FA and MD along the GCC between the male and female groups and to 
delineate the development of fiber diffusion properties across time, which is 
addressed by including the gestational age at MRI scanning as a covariate. 
FA and MD, respectively, measure the inhomogeneous extent of local barriers 
to water diffusion and the averaged magnitude of local water diffusion. We 
fit model (1.1) to the FA and MD values from all 128 subjects, in which Xj = 
(1, G, Age)^, where G represents gender. We then applied the estimation and 
inference procedures to estimate B(s) and calculate Sn for each hypothesis 
test. We approximated the p-value of Sn using the wild bootstrap method 
with G = 1000 replications. Finally, we constructed the 95% simultaneous 
confidence bands for the functional coefficients of Bj(s) for j = 1,2. 

Figure 4 presents the estimated coefficient functions corresponding to 1, 
G and Age associated with FA and MD (blue solid lines in all panels of 
Figure 4). The intercept functions [panels (a) and (d) in Figure 4] describe 
the overall trend of FA and MD. The gender coefficients for FA and MD 
in Figure 4(b) and (e) are negative at most of the grid points, which may 
indicate that compared with female infants, male infants have relatively 
smaller magnitudes of local water diffusivity along the genu of the corpus 
callosum. The gestational age coefficients for FA [panel (c) of Figure 4] are 
positive at most grid points, indicating that FA measures increase with age in 
both male and female infants, whereas those corresponding to MD [panel (f ) 
of Figure 4] are negative at most grid points. This may indicate a negative 
correlation between the magnitudes of local water diffusivity and gestational 
age along the genu of the corpus callosum. 

We statistically tested the effects of gender and gestational age on FA and 
MD along the GCC tract. To test the gender effect, we computed the global 
test statistic Sn = 144.63 and its associated p-value (p = 0.078), indicating 
a weakly significant gender effect, which agrees with the findings in panels 
(b) and (e) of Figure 4. A moderately significant age effect was found with 
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(e) (f) 

Fig. 4. Plot of estimated effects of intercept /(a), (d)], gender [{h), (e)], and age [{c), 
(f)y and their 95% confidence bands. The first three panels /(a), (b), (c)/ are for FA and 
the last three panels /(d), (e) and (f)/ are for MD. The blue solid curves are the estimated 
coefficient functions, and the red dashed curves are the confidence bands. 
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(c) 

Fig. 5. P/ot of the first 10 eigenvalues (a) anrf the first 3 eigenfunctions for FA (b) onrf 
MD (c). 

S'n = 929.69 (p- value < 0.001). This agrees with the findings in panel (f) 
of Figure 4, indicating that MD along the GCC tract changes moderately 
with gestational age. Furthermore, for FA and MD, we constructed the 95% 
simultaneous confidence bands of the varying-coefficients for Gj and age, 
(Figure 4). 

Figure 5 presents the first 10 eigenvalues and 3 eigenfunctions of t) 
for j = 1,2. The relative eigenvalues of '^rjjj defined as the ratios of the 
eigenvalues of J^rjjj{s,t) over their sum have similar distributional patterns 
[panel (a) of Figure 5]. We observe that the first three eigenvalues account for 
more than 90% of the total and the others quickly vanish to zero. The eigen- 
functions of FA corresponding to the largest three eigenvalues [Figure 5(b)] 
are different from those of MD [Figure 5(c)]. 

In the supplement article [53] , we further illustrate the proposed method- 
ology by an empirical analysis of another real data set. 
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APPENDIX 
We introduce some notation. We define 

n M 

TBj{h,s) = '^^^ Kh{Sm - s)[xi (g) •Zh{Sm - s)]xjBj{Sm), 
1=1 m=l 

n M 

Trjj{h,s) = ^ ^ Kh{Sm - s)[xi (g) Zhi-Sm - s)]r]ij{Sm), 
i=l m=l 

n M 

(A.l) Tsj{h,s) = X] ^/»(*"* ~ s)[xi (g)z/i(sm - s)]eij{sm), 

i=l m=l 

r (i^- s h) = ^2(^;g,/t)^ - ui{K;s,h)u-i{K]s,h) 
Uo{K;s,h)u2{K;s,h)—ui{K;s,h)'^^ 

Hh{Sm - s)= Kh{Sm - s)zh{Sm - s), 
AI 

Aj{s;r]i,hij) = ^ Hh^^{sm - s)r]ij{sm) 

m=l 
1 

Hh^^{u - s)r]ij{u)TT{u)du, 



where Ur{K; s, h) = h~^'i(u — sYKh{u — s) du for r > 0. Throughout the 
proofs, C/fc's stand for a generic constant, and it may vary from hne to hne. 

The proofs of Theorems 1-5 rely on the following lemmas whose proofs 
are given in the supplemental article [53]. 

Lemma 1. Under Assumptions (CI), (C3)-(C5) and (C7), we have that 
for each j , 



(A. 2) sup n -^/^hij\Tsj{hij,s)\=Op{JMhij\loghij\) = Op{Mhij). 
se[o,i] ^ 

Lemma 2. Under Assumptions (CI), (C4), (C5) and (C7), we have that 
for any r > and j , 



sup 

sG[0,l] 



sup 

sG[0,l] 



Kh,, {u - s)^^^^-^ d[UM{u) - n(n)] 
hi- 

(u - sY 

Kh-,j {u - s) — — Eij (n) dUM (u) 



Op{{Mhi,)-^/^^\loghij\), 



where IlMi') is the sampling distribution function based onS = {si, . . . , sm}; 
and n(-) is the distribution function of Sm- 
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Lemma 3. Under Assumptions (C2)-(C5), we have 



(A.3) 



sup 

sG[0,l] 



n 



-1/2 



:Op(l). 



Lemma 4. If Assumptions (CI) and (C3)~(C6) hold, then we have 
E[Bj{s)\S]-Bj{s) = 0.5hljU2{K)B,{s)[l + Op{l)], 

\(ii[Bj{s)\S]=n-'^j:r,,jj{s,s)n~^[l + Op{l)], 
where en{s) = Op{{Mhij)-^/^) with E[en{s)] =0. 



(A.4) 



Lemma 5. If Assumptions (CI) and (C3)-(C6) hold, then for s = or 
1, we have 



(A.5) 



E[Bj{s)\S]-Bj{s) = 0.5hljru{K;s,hij)Bj{s)[l + Op{l)], 



Var[Bj(s)|cS] = s)J73^^[l + Op(l)] 

Lemma 6. Under Assumptions (C1)-(C9), we have 
1 



supn 

{s,t) 



supn 

(s,t) 



supn 



supn 



i=l 

^eij{s)^7]ij{t) 

n 
i=l 

^Ar/ij(s)xi 



i=l 



Op(n-V2(iogn)i/2), 
Op(n-i/2(logn)V2), 
Op(n-V2(iogn)i/2), 
Op(n-V2(iog„)i/2), 



Lemma 7. Under Assumptions (C1)-(C9) , we have 



supn 

{s,t) 



^£ij{s)£ij{t) 



i=l 



0{{Mh2,)-' + (logn/n)V2) ^ 



We present only the key steps in the proof of Theorem 1 below. 

Proof of Theorem 1. Define 

U2{K; s, H) = diag(r„(K; s, hn), • • • , ru{K; s, hu)), 
Xn{s) = V^{B{s)-E[B{s)\S]}, 
Xn,j{s) = V^{B,{s) - E[Bjis)\S]}. 
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According to the definition of vec{Aj(s)), it is easy to see that 

(A.6) vec{Aj{s)) = T,{s, hijy^[TBj{hij, s) + Tsj{hij , s) + Tr^j{hij, s)], 
(A.7) Xn,j{s) = V^[Ip^{l,0)]J:{s,hijy^[Te,.j{hij,s) + T^j{hij,s)]. 
The proof of Theorem l(i) consists of two parts: 

• Part 1 shows that ^ynT,{s,hij)~^Tirj{hij,s) = Op{l) holds uniformly for 
ah sG [0,1] and j = l,..., J. 

• Part 2 shows that y/nL{s,hij)~^Trjj{hij , s) converges weakly to a Gaus- 
sian process G(-) with mean zero and covariance matrix S^jj(s, 

for each j. 

In part 1, we show that 
(A.8) (1, 0)]S(s, hij)~^Te,j{hij,s) = Op(l). 

It follows from Lemma 1 that 

n ( M ^ 

^-1/2^^. ^ I ^ Kh^^{Sm - s)zh,^{s)eij{Srn) > = Op(l) 

i=l I m=l J 

hold uniformly for all s € [0, 1]. It follows from Lemma 2 that 

(A.9) (nM)-^S(s, hij) = Ox ® ^iihij,s) + Op(l) 

hold uniformly for all s G [0,1]. Based on these results, we can finish the 
proof of (A.8). 

In part 2, we show the weak convergence of -^/n[Ip (g) {l,0)]T,{s,hij)~^ x 
Trfj{hij,s) for J = 1, ... , J. Part 2 consists of two steps. In Step 1, it follows 
from the standard central limit theorem that for each s £ [0, 1], 

(A.IO) V^[Ip » (1, 0)]S(s, hyy\j{hi,,s) iV(0, S^jj (s, s)^-^^), 

where — denotes convergence in distribution. 

Step 2 shows the asymptotic tightness of -^/^[Ip (8) (1, 0)]S(s, x 
^jihij,s). By using (A.9) and (A.l), ^J:{s,hij)-%j{hij,s)[l + Op{l)] 
can be approximated by the sum of three terms (I), (II) and (III) as follows: 

n 

(I) = Q-i^. ^ n-^(^hij,sy^Aj{s; Vi,hij), 

i=l 
n 

(II) = ^-1^. ^ ^i^^. ^ s)-^r]ij (s) 

i=l 

X / K{u){l,u) Tr{s + hiju) du, 

J max(— shr.^ , — 1) 
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(A.ll) 

(III) = ^-1^. ^ fi^^hij , s)-^ 



1=1 

|-min{(l-s)h-/,l) / i\ 

X / K(n) [r/ij(s + /iiju) -r?ij(s)] 

Jma.K{-sh-j,-l) V"/ 

X 7r(s + hiju) du. 

We investigate the three terms on the right-hand side of (A.ll) as follows. 
It follows from Lemma 3 that the first term on the right-hand side of (A.ll) 
converges to zero uniformly. We prove the asymptotic tightness of (II) as 
follows. Define 

n 

Xn,j{s) = n-i/2^!^3^^Xi <g) (1, 0)171 (/iij,s)-Sii(s) 

i=l 

min((l-s)/i-/,l) 

K{u){l,u)^ ■k{s + hiju) du. 

max(— s/ij^^.^ 

Thus, we only need to prove the asymptotic tightness of Xnj{s). The asymp- 
totic tightness of Xn,j{s) can be proved using the empirical process tech- 
niques [42]. It follows that 

^min((l-s)/i-/,l) 

{l,0)ni{hij,s)-^ / K{u){l,uyTr{s + hiju) du 

_ U2{K; s,hij)uo{K; s,hij) - ui{K; s, hijf + ojhij) _ _^ x 
U2{K;s,hij)uo{K;s,hij) -ui{K;s,hij)'^ + o{hij) ' 

Thus, Xn,j{s) can be simplified as 

n 

Xn,j{s) = [l + o{hij)\n-^/^^7]ij{s)a-^^^i. 

i=l 

We consider a function class = {/(s; x, r/. j) = ri^^^xry. j (s) : s G [0, 1]}. Due 
to Assumption (C2), is a P-Donsker class. 

Finally, we consider the third term (III) on the right-hand side of (A.ll). 
It is easy to see that (III) can be written as 

fmm{(l-s)/i^^.\l) 

"^/^y^Xj{?fa(g + /tijn) -r]ij{s)} 



K{u) 

max(— s/i-j^^. ,—1) 



n 

i=l 



X 7r(s + hiju) du. 
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Using the same argument of proving the second term (II), we can show the 
asymptotic tightness of n~^^'^Y17=i^iVij{^)- Therefore, for any hij — s-O, 



(A.12) sup 

se[o,i],|«|<i 



n 

-1/2 



^Xi{r/ij(s + hiju) - r]ij{s)} 



Op{l). 



n 

1=1 

It follows from Assumptions (C5) and (C7) and (A.12) that (III) converges 
to zero uniformly. Therefore, we can finish the proof of Theorem l(i). Since 
Theorem l(ii) is a direct consequence of Theorem l(i) and Lemma 4, we 
finish the proof of Theorem 1. □ 

Proof of Theorem 2. Proofs of parts (a)-(d) are completed by some 
straightforward calculations. Detailed derivation is given in the supplemental 
document. Here we prove part (e) only. Let KM,h{s) = KM{s/h)/h, where 
Km{s) is the empirical equivalent kernels for the first-order local polynomial 
kernel [11]. Thus, we have 

M 

(A.13) 



m=l 

M 



,h2j i^m s)[r]ij{Sm) +£ij{Sm) 



We define 



m=l 



= KM,h2j{Sm - s)£ij{Sm), 

m=l 
M 



m=l 

M 



ABj{s) = ^ KM,h2ji^rn - s)[Bj{Sm) - Bj{s,n)], 
m=l 

Aij(s) =eij{s)+Ar]ij{s)+xfABj{s). 
It follows from (A.13) that 

(A. 14) fiij{s)-r]ij{s) = Aij{s) =eij{s)+Arjij{s)+xjABj{s) 
It follows from Lemma 2 and a Taylor expansion that 



and 

sup \Ar]ij{s)\=Op{l) sup |f/ij(s)|/i^j.^^. 

sG[0,l] s6[0,l] 



28 H. ZHU, R. LI AND L. KONG 

Since y/n{Bj{-)- Bj{-)-0.5u2{K)'^ hl^Bj {■)[! + Op{l)]} weakly converges to a 
Gaussian process in i°°{[0, 1]) as n — ^ oo, y/n{Bj{-) - Bj{-) — 0.5u2{K)'^hlj x 
Bj{-)[1 + Op(l)]} is asymptotically tight. Thus, we have 

M 

ABijis) = -Yl ^mmM - s)^.hU2{Kfhl^Bj{Sm)[l + Op{l)] 
m=l 

M 

+ KmmM - s){0.5n2(i^)2/i2.^j(s„,)[l + Op(l)] 

m=l 

+ Bj{Sm) - Bj{Sm)}, 

sup \\ABj{s)\\ = Op(n-^/2) ^ Op{h%). 

sg[0,l] 

Combining these results, we have 
sup \i%{s) - Vij{s)\ = Op(|log(/i2j)r/'(M/i2j)-^/2 + /lif + hlj + n-i/2)_ 

sG[0,l] 

This completes the proof of part (e). □ 

Proof of Theorem 3. RecaU that fjij{s) = r]ij{s) + Aij{s), we have 

n n n 



n 



■1=1 



i=l 



i=l 



(A.15) 

7 t 7 t 

+ n'~^y^^Aij{s)riij{t) +n~^y^^r]ij{s)riij{t). 

i=l i=l 

This proof consists of two steps. The first step is to show that the first three 
terms on the right-hand side of (A.15) converge to zero uniformly for all 
{s,t) G [0, 1]^ in probability. The second step is to show the uniform conver- 
gence of n~^Y17=i''lij{s)''lij{'t) to S^(s,t) over {s,t) G [0,1]^ in probability. 
We first show that 



(A.16) supn"^ 

{s,t) 



YAij{s)i]ij{t) 



1=1 



Since 



YAij{s)i]ij{t) 



1=1 



(A.17) 



Y^ijisHjii) 



1=1 



+ 



Y^r]ij{s)r]ij{t) 



i=l 
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i=l 



it is sufficient to focus on tfie tliree terms on tlie riglit-hand side of (A. 17). 
Since 



|xj ABj{s)r]ij{t)\ < \\xi\\2 sup \\ABk{s)\\2 sup 

se[o,i] te[o,i] 



we liave 



n 



1=1 



< sup ||A5fe(s)||2n-l V||Xi||2|7?»j(t)| 

se[o,i] 



i=l 



Opin-'/^ + hl.). 



Similarly, we have 



n 



'^AT]ij{s)r]ij{t) 



1=1 



< V sup \Arjij{s)vij(.t)\ = OpQif]^) = Op(l). 
~t«,te[o,i] 



It follows from Lemma 6 that sup(g^^) n '^{ \ Yl'^=i \ = 0((logn/n)^/^). 

Similarly, we can show that supj-^ n"-"^! ^"^^^ Ajj(t)7yjj(s)| = Op(n~^/^ + 
hl^ + h% + {\ogn/n)^l^). 
We can show that 



(A.18) 
Note that 



sup 

{s,t) 



n 



i=l 



0,(n-i/2). 



\mj{si)m3{^i) - Vij{s2)vij(.t2)\ 

< 2(|si - S2I + |ti -t2|) sup \r]ij{s)\ sup |?7ij(s)| 

sG[0,l] sG[0,l] 

holds for any (si,ti) and (s2,t2)i the functional class {i]j{u)i]j{v) : {u,v) £ 
[0, 1]^} is a Vapnik and Cervonenkis (VC) class [31, 42]. Thus, it yields that 
(A.18) is true. 

Finally, we can show that 



(A.19) 



supn 



^Aij{s)A,,{t) 

i=l 

Op{iMh2j)-' + (log n/n)i/2 + + /ijf ). 
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With some calculations, for a positive constant Ci, we have 



^A,,-(s)Ay(t) 



i=l 



< Ci sup 



i=l 



+ 



^eij{s)Ariij{t) 



i=l 



+ 



1=1 

n 



+ 



1=1 



+ 



^eij{s)^fAB,{t) 

1=1 

Y,^jABj{s)AB,it)^. 



i=l 



It follows from Lemma 7 that 



supn 

{s,t) 



i=l 



Op{{Mh2j)-^ + {\ogn/nfl^) 



supn 

{s,t) 



^e,j(s)Ar/ij(t) 



i=l 



+ 



^Ar?,,(t)xfAi?,(s) 



i=l 



+ 



2^%(^)xfAi?,(t) 



i=l 



:Op((logn/n)V2). 



Since sup^g[o,i] \Ar]ij{s) \ = C2Sup^g[o,i] \rjij{s)\hlj, we have 

n 

supn-1 VA7?,,(s)A7?,j(t) =0(/iSf ). 
Furthermore, since sup^gjo.i] II^B(s)|| = Op{n~^^'^ + /i^), we have 



n 



^xfAi?,(s)Ai?,(t)x, 



j=i 



Note that the arguments for (A.16)-(A.19) hold for •) for any j ^ j' . 

Thus, combining (A.16)-(A.19) leads to Theorem 3(i). 

To prove Theorem 3(ii), we follow the same arguments in Lemma 6 of Li 
and Hsing [32]. For completion, we highlight several key steps below. We 
define 



(A.20) {A^j,j){s)= [j:r,jj{s,t)-j:r,jj{s,tMj,j{t)dt. 
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Following Hall and Hosseini-Nasab [21] and the Cauchy-Schwarz inequality, 
we have 

1 ^ 2 

r r />! -| 1/2 pi f-l \ 

<C2\ J {Atl;jj){sf ds + J J [^v,jj{s,t) -J:n,jj{s,t)f dsdtj 

( ^ 1/2 C fl ^ 1/2 

+ / [tr,jj{s,t) -T,^jj{s,t)f dsdt 
Jo Jo 



sup \T,ri.jj{s,t) — TiriJj{s,t)\, 
(s,i)g[0,l]2 

which yields Theorem 3(ii)(a). 

Using (4.9) in Hall, Miiller and Wang [22], we have 

<\ / [trjjj -J:r,,jj]{s,t)'ll)jj{s)'lpjj{t)dsdt 

Jo Jo 



+ o(^j\/\'4jj^j){sfds 



sup \Yirijj{s,t) — 
(s,t)6[0,l]2 

which yields Theorem 3(ii)(b). This completes the proof. □ 

Proof of Theorem 5. The proof of Theorem 5 is given in the supple- 
ment arctile [53]. □ 

Acknowledgments. The authors are grateful to the Editor Peter Biihlmann, 
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SUPPLEMENTARY MATERIAL 

Supplement to "Multivariate varying coefficient model for functional re- 
sponses" (DOT: 10.1214/12-AOS1045SUPP; .pdf). This supplemental mate- 
rial includes the proofs of all theorems and lemmas. 
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